

DESMATICS, INC.


P.  O.  Box  618 

State  College,  Pa.  16801 

Phone:  (814)  238-9621 


Applied  Research  in  Statistics  -  Mathematics  -  Operations  Research 


OPTIMAL AUGMENTATION OF EXPERIMENTAL
DESIGNS FOR ESTIMATION
OF THE LOGISTIC FUNCTION


Leslie  A.  Kalish 
and 

Dennis  E.  Smith 




TECHNICAL  REPORT  NO.  112-5 


May  1980 


This  study  was  supported  by  the  Office  of  Naval  Research 
under  Contract  No.  N00014-79-C-0128,  Task  No.  NR  207-037 
and  Contract  No.  N00014-75-C-1054,  Task  No.  NR  042-334 

Reproduction  in  whole  or  in  part  is  permitted 
for  any  purpose  of  the  United  States  Government 

Approved  for  public  release;  distribution  unlimited 


TABLE OF CONTENTS


I. INTRODUCTION

II. BACKGROUND

III. AUGMENTATION OF EXPERIMENTAL DATA

A. OPTIMALITY CRITERION

B. IMPLEMENTATION PROBLEMS

C. EFFICIENCY

IV. SIMULATION

A. PROCEDURE

B. RESULTS

V. EXAMPLES

A. TWO-PARAMETER CASE

B. THREE-PARAMETER CASE

VI. SUMMARY

VII. REFERENCES


I.  INTRODUCTION 


Two  previous  Desmatics  technical  reports  [5,  9]  discussed  optimal 
designs  for  the  estimation  of  the  two-parameter  logistic  function  and 
estimation  accuracy  respectively.  Another  report  [7]  discussed  the  use 
of  the  logistic  function  for  prediction  of  impact  acceleration  injury. 

In two other reports [8, 10] injury prediction models were constructed
from a set of twenty-eight -Gx accelerator runs with Rhesus monkeys as
subjects. It was suggested that additional runs be made in order to
produce more reliable results.

This  report  discusses  the  problem  of  how  to  specify  additional  runs 
optimally.  After  a  criterion  for  optimal  augmentation  of  experimental 
data  is  presented,  practical  problems  in  its  implementation  are  discussed. 
A  simulation  study  for  evaluating  the  augmentation  procedure  for  the  two- 
parameter  case  is  described.  Examples  using  accelerator  data  from  [8] 
and  [10]  are  given. 




II.  BACKGROUND 


The model being used to predict injury is the logistic model, which
has the form

$$P_i = f(\mathbf{x}_i, \boldsymbol{\beta}) = [1 + \exp(-\mathbf{x}_i'\boldsymbol{\beta})]^{-1} \tag{1}$$

where $\mathbf{x}_i = (1, x_{i1}, x_{i2}, \ldots, x_{ik})'$ and
$\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_k)'$. $P_i$ denotes
the probability of injury on the $i$th accelerator run, $\boldsymbol{\beta}$ is a
vector of parameter values and the vector $\mathbf{x}_i$ gives the values of the
independent (predictor) variables on the $i$th run. The first element in
$\mathbf{x}_i$, a "dummy variable," is included to provide for the estimation of an
intercept. The $i$th observed probability, denoted $p_i$, is given by

$$p_i = P_i + \epsilon_i = [1 + \exp(-\mathbf{x}_i'\boldsymbol{\beta})]^{-1} + \epsilon_i$$

where $\epsilon_i$ denotes the error term. Because of the binomial response (injury
or noninjury), $p_i$ is either 1 or 0.

It is appropriate to use weighted least squares to estimate $\boldsymbol{\beta}$. It
can be shown that the weighted least squares estimate of $\boldsymbol{\beta}$ is given by

$$\mathbf{b} = (X'HX)^{-1}X'H\mathbf{z} \tag{2}$$

where $\mathbf{b}$ denotes the estimated parameter vector

$$\mathbf{b} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k)';$$

$X$ denotes the design matrix (for an $n$ run design)

$$X = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)';$$

$H$ denotes the diagonal weight matrix

$$H = \mathrm{diag}(P_1Q_1, P_2Q_2, \ldots, P_nQ_n)$$

(where $Q_i = 1 - P_i$); and $\mathbf{z}$ denotes the vector of "working observations"

$$\mathbf{z} = (y_1, y_2, \ldots, y_n)'$$

where $y_i = (1/P_iQ_i)[p_i - P_i + P_iQ_i \ln(P_i/Q_i)]$. For more details on the
estimation of $\boldsymbol{\beta}$, see Kalish and Smith [5], Walker and Duncan [11], or
Smith [7].
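
Because $H$ and $\mathbf{z}$ both depend on the current parameter values, equation (2)
is applied repeatedly until the estimate settles. The following is a minimal sketch of
that iteration for the two-parameter case only. It is an illustration under stated
assumptions, not the program used for this report: the function name `wls_step`, the
starting values and the six-run data set are all hypothetical, and the 2 x 2 system is
solved in closed form.

```python
import math

def wls_step(xs, ps, b0, b1):
    """One pass of equation (2) for k = 1, starting from the estimate (b0, b1)."""
    s_w = s_wx = s_wxx = s_wy = s_wxy = 0.0
    for x, p_obs in zip(xs, ps):
        eta = b0 + b1 * x                      # equals ln(P/Q)
        P = 1.0 / (1.0 + math.exp(-eta))
        w = P * (1.0 - P)                      # diagonal element of H
        y = eta + (p_obs - P) / w              # working observation y_i
        s_w += w
        s_wx += w * x
        s_wxx += w * x * x
        s_wy += w * y
        s_wxy += w * x * y
    det = s_w * s_wxx - s_wx ** 2              # |X'HX|
    new_b0 = (s_wxx * s_wy - s_wx * s_wxy) / det
    new_b1 = (s_w * s_wxy - s_wx * s_wy) / det
    return new_b0, new_b1

# Hypothetical data: six runs, 1 = injury, 0 = noninjury.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ps = [0, 0, 1, 0, 1, 1]

b0, b1 = 0.0, 0.0                              # assumed starting values
for _ in range(25):                            # fixed number of passes for simplicity
    b0, b1 = wls_step(xs, ps, b0, b1)
print(b0, b1)
```

A fixed number of passes is used here only to keep the sketch short; a practical
program would stop when successive estimates agree to a chosen tolerance.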

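The asymptotic covariance matrix of $\mathbf{b}$ is

$$\mathrm{Var}(\mathbf{b}) = (X'HX)^{-1} \tag{3}$$

In practical applications, the covariance matrix must be estimated by
substituting

$$\hat{H} = \mathrm{diag}(\hat{P}_1\hat{Q}_1, \hat{P}_2\hat{Q}_2, \ldots, \hat{P}_n\hat{Q}_n)$$

for $H$ in (3), where $\hat{P}_i = [1 + \exp(-\mathbf{x}_i'\mathbf{b})]^{-1}$ and
$\hat{Q}_i = 1 - \hat{P}_i$.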

For the two-parameter case ($k = 1$), equation (1) is a sigmoid
curve of the form

$$P = \{1 + \exp[-(\beta_0 + \beta_1 x)]\}^{-1}. \tag{4}$$

$LD_{100P}$ is defined to be the value of variable $x$ which results in a
probability $P$ of response; that is,
$P = (1 + \exp\{-[\beta_0 + \beta_1(LD_{100P})]\})^{-1}$.
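
Solving equation (4) for $x$ gives $LD_{100P} = [\ln(P/(1 - P)) - \beta_0]/\beta_1$.
The short sketch below evaluates both directions of this relationship. The function
names are hypothetical, and the parameter values are simply those assumed for
simulations B1 and B2 later in this report.

```python
import math

def logistic_p(x, b0, b1):
    """Equation (4): probability of response at predictor level x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def ld(percent, b0, b1):
    """Level of x giving a response probability of percent/100 (LD_100P)."""
    p = percent / 100.0
    return (math.log(p / (1.0 - p)) - b0) / b1

b0, b1 = -30.0, 1.1            # parameter values assumed in simulations B1 and B2
x50 = ld(50, b0, b1)           # LD50 reduces to -b0/b1
print(x50, logistic_p(x50, b0, b1))
```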


III.  AUGMENTATION  OF  EXPERIMENTAL  DATA 

In general, an optimal experimental design is one which "minimizes"
the covariance matrix $(X'HX)^{-1}$. The meaning of the word minimize, when
applied to a matrix, is not obvious. A number of functionals of a
covariance matrix have been proposed as criteria for minimization; for
each criterion there is a corresponding optimal design. (See Kiefer [6].)

In  Kalish  and  Smith  [5]  optimal  designs  for  the  two-parameter  logistic 
function  are  constructed  using  four  such  criteria.  Although  the  topic 
being  addressed  in  this  report  is  not  the  construction  of  optimal  designs, 
it  will  be  seen  that  the  optimal  augmentation  of  an  existing  design  is 
closely  related  to  optimal  design  methodology. 

A.  OPTIMALITY  CRITERION 

Perhaps the most widely used criterion for the construction of optimal
designs is D-optimality. An n-point design is said to be D-optimal if the
determinant of $(X'HX)^{-1}$, denoted $|(X'HX)^{-1}|$, is minimized over all n-point
designs. Since $X'HX$, often called the "information matrix," has determinant
$|X'HX| = 1/|(X'HX)^{-1}|$, the criterion can equivalently be expressed as
maximizing $|X'HX|$.

Dykstra [3] utilized the notion of D-optimality in augmenting an existing
n-point design by specifying the (n + 1)st point as the one which yields the
largest increase in the determinant of the information matrix. (Dykstra
expresses the information matrix as $X'X$ because his work is based on the
assumption of ordinary least squares estimation, where the weight
matrix equals the identity matrix. Since it is appropriate to use weighted
least squares estimation with the logistic function, the augmentation
criterion used here is to maximize the increase in $|X'HX|$.) By subsequently
considering the newly added point as part of the existing design, the process
can be repeated to specify additional points.
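
For the two-parameter case the increase in $|X'HX|$ has a simple closed form: adding a
point $x$ with weight $w(x) = P(x)Q(x)$ raises the determinant by
$w(x)[S_{wxx} - 2xS_{wx} + x^2S_w]$, where $S_w$, $S_{wx}$ and $S_{wxx}$ are the weighted
sums over the existing design that make up $X'HX$. The sketch below uses this to pick
the next point. The grid search, the search interval and the five-point design are
assumptions made for illustration, and the parameters are treated as known.

```python
import math

def weight(x, b0, b1):
    """Diagonal element of H at x: P(x) * Q(x)."""
    P = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return P * (1.0 - P)

def sums(xs, b0, b1):
    """Weighted sums that make up the 2 x 2 matrix X'HX."""
    ws = [weight(x, b0, b1) for x in xs]
    s_w = sum(ws)
    s_wx = sum(w * x for w, x in zip(ws, xs))
    s_wxx = sum(w * x * x for w, x in zip(ws, xs))
    return s_w, s_wx, s_wxx

def det_info(xs, b0, b1):
    s_w, s_wx, s_wxx = sums(xs, b0, b1)
    return s_w * s_wxx - s_wx ** 2

def gain(x, xs, b0, b1):
    """Increase in |X'HX| when the point x is added to the design xs."""
    s_w, s_wx, s_wxx = sums(xs, b0, b1)
    return weight(x, b0, b1) * (s_wxx - 2.0 * x * s_wx + x * x * s_w)

def next_point(xs, b0, b1, lo, hi, steps=1000):
    """Grid point in [lo, hi] giving the largest increase in |X'HX|."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: gain(x, xs, b0, b1))

design = [24.0, 26.0, 27.0, 28.0, 30.0]    # hypothetical existing design
b0, b1 = -30.0, 1.1                        # parameters treated as known here
x_new = next_point(design, b0, b1, lo=20.0, hi=35.0)
print(x_new, det_info(design + [x_new], b0, b1))
```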


B.  IMPLEMENTATION  PROBLEMS 

One  problem  with  the  implementation  of  the  augmentation  criterion  is 
that  the  level  of  the  predictor  variables  must  be  controllable  to  some 
acceptable  degree  of  accuracy  in  order  to  be  able  to  make  observations  at 
the  specified  optimal  new  design  points.  A  different  sort  of  problem  that 
can  occur  when  k  >  1  (that  is,  when  there  is  more  than  one  predictor  variable) 
is that the optimal new design point may diverge to infinite values. In
this case, the experimental region must be arbitrarily constrained to ensure
that the augmented design points have reasonable values. An example of this
is given in Section V.B.

The  particular  form  of  the  weight  matrix  gives  rise  to  a  third  problem: 
the  fact  that  H  is  a  function  of  the  parameters,  which  are  not  known  at  the 
time  of  experimentation,  results  in  a  "Catch-22"  wherein  it  is  necessary  to 
know  the  parameter  values  in  order  to  decide  how  best  to  estimate  them! 

A practical solution to this dilemma is to use the existing experimental
data to estimate $\boldsymbol{\beta}$ and then to use the estimate, $\mathbf{b}$, to
specify an approximately optimal new design point. Such a procedure will be referred
to as an "approximate procedure" as opposed to the "exact procedure" which is only
hypothetically possible.

Of course, it is best to update the parameter estimates before each
additional point is specified. However, it is often difficult, if not
impossible, to do this because of time and cost constraints or because of
the nature of the experiment. In this case, more than one additional
point can be specified using the same estimate of $\boldsymbol{\beta}$ to estimate the
information matrix at each stage. That is, the information matrix can be
reestimated after each new design point is specified without reestimating
the parameters. After, say, five new design points are specified in this
way, the five new runs can be made and the new data used to update the
estimates of $\boldsymbol{\beta}$ for specifying the next "block" of five points.

Note that within each block, the five new points are added one at a
time; there may be a different set of five new points which would yield a
larger increase in $|X'HX|$. Unfortunately, the calculations involved in
maximizing the increase in $|X'HX|$ with respect to more than one new point
quickly become intractable and the cost involved in a computerized numerical
search approach is prohibitive.
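
The one-at-a-time block scheme described above can be sketched as a greedy loop in
which the parameter estimate is held fixed while the design grows. This is a minimal
illustration for the one-variable case: the function name `specify_block`, the grid
search, the search interval and the starting design are assumptions, not details taken
from the report.

```python
import math

def weight(x, b0, b1):
    P = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return P * (1.0 - P)

def det_info(xs, b0, b1):
    """|X'HX| for a one-variable design xs evaluated at (b0, b1)."""
    s_w = s_wx = s_wxx = 0.0
    for x in xs:
        w = weight(x, b0, b1)
        s_w += w
        s_wx += w * x
        s_wxx += w * x * x
    return s_w * s_wxx - s_wx ** 2

def specify_block(xs, b0, b1, lo, hi, size=5, steps=600):
    """Specify `size` points one at a time, holding the estimate (b0, b1) fixed."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    design = list(xs)
    block = []
    for _ in range(size):
        best = max(grid, key=lambda x: det_info(design + [x], b0, b1))
        design.append(best)        # the new point joins the "existing" design
        block.append(best)
    return block

design = [24.0, 26.0, 27.0, 28.0, 30.0]    # hypothetical existing design
b0, b1 = -30.0, 1.1                        # stands in for the current estimate b
block = specify_block(design, b0, b1, lo=20.0, hi=35.0)
print(block)
```

Each pass is a one-dimensional search, which is what keeps the procedure cheap; a
joint search over all five points at once would be a five-dimensional problem.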

C.  EFFICIENCY 

A measure of the amount of information in a design which does not depend
on the number of points in the design, $n$, is the normalized determinant,
$[1/n^{k+1}]|X'HX|$. (See Draper and St. John [2].) For example, when $k = 1$
D-optimal designs have $[1/n^2]|X'HX| = [P'Q' \ln(P'/Q')/\beta_1]^2$, where
$P' = .824$ and $Q' = 1 - P'$, or $[1/n^2]|X'HX| = .0501185/\beta_1^2$. (See Kalish
and Smith [5].) For the remainder of this section, only the one variable case will be
discussed. By dividing the value of the normalized determinant for any particular
design by $.0501185/\beta_1^2$, the efficiency relative to D-optimality is calculated.
This will be called D-efficiency. (See Draper and St. John [1].)
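
As a check on the constant, the sketch below computes D-efficiency for two designs:
one with half its runs at $LD_{17.6}$ and half at $LD_{82.4}$, which should score
approximately 1, and one using a few of the predictor values from Figure 1. The
function names are hypothetical and the parameter values are those assumed for
simulations A1 and A2.

```python
import math

D_OPT = 0.0501185     # [P'Q' ln(P'/Q')]^2 at P' = .824, as quoted in the text

def weight(x, b0, b1):
    P = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return P * (1.0 - P)

def ld(percent, b0, b1):
    p = percent / 100.0
    return (math.log(p / (1.0 - p)) - b0) / b1

def d_efficiency(xs, b0, b1):
    """Normalized determinant of the design divided by .0501185 / b1^2."""
    n = len(xs)
    s_w = sum(weight(x, b0, b1) for x in xs)
    s_wx = sum(weight(x, b0, b1) * x for x in xs)
    s_wxx = sum(weight(x, b0, b1) * x * x for x in xs)
    det = s_w * s_wxx - s_wx ** 2
    return (det / n ** 2) / (D_OPT / b1 ** 2)

b0, b1 = -9.0, 0.3                                   # simulations A1 and A2
two_point = [ld(17.6, b0, b1)] * 5 + [ld(82.4, b0, b1)] * 5
spread = [18.8, 22.0, 25.2, 28.1, 31.7, 33.5]        # a few x values from Figure 1
print(d_efficiency(two_point, b0, b1))
print(d_efficiency(spread, b0, b1))
```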




Consider the problem of adding an $n$th point to an existing $(n - 1)$-point
design. If the parameter values were known, the $n$th point could be specified
using the exact augmentation procedure and the D-efficiency of the new design
could be calculated. Denote the resulting efficiency $DE_n$. In practice only
an approximately optimal $n$th point can be specified. Letting $DA_n$ denote the
D-efficiency after adding the approximately optimal $n$th point, a plot of $DA_n$
versus $n$ would illustrate the progress of an experimental design in achieving
D-optimality. The ratio $R_n = DA_n/DE_n$, to be called relative efficiency,
plotted against $n$ would illustrate the progress of the approximate procedure
in approaching the exact procedure.

Unfortunately,  it  is  impossible  to  calculate  any  of  these  efficiencies 
in  a  practical  situation  since  the  true  parameter  values  are  not  known. 
Therefore  a  simulation  was  performed  to  give  a  typical  picture  of  what  might 
be  expected  in  an  actual  experiment. 




IV.  SIMULATION 


A simulation study was conducted to evaluate the augmentation procedure
under a variety of conditions in the one variable case. Six separate
simulations (A1, A2, B1, B2, C1, C2) were run. In simulations A1 and A2,
the initial design contained 25 points scattered between $LD_{3}$ and $LD_{75}$ while
in simulations C1 and C2, the initial design contained 25 points, 20 of which
were below $LD_{5}$ or above $LD_{95}$. Simulations B1 and B2 represented a compromise
between the A's and the C's.

The values of $x_i$ for all three initial designs were the same, but their
LD levels were controlled by changing the assumed parameter values. In Figure
1, the values of $x_i$ are shown along with their corresponding probabilities,
$P_i$, for each set of parameters. (The $P_i$'s are calculated from model equation
(1) or (4).) The three assumed parameter vectors are also shown in the figure.
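
The probabilities in Figure 1 follow directly from equation (4). The sketch below
recomputes them from the 25 predictor values and the three parameter vectors listed in
the figure, and counts how many points in each design have $P_i$ below .05 or above
.95. Variable and function names are hypothetical.

```python
import math

X = [18.8, 22.0, 22.6, 23.5, 23.7, 24.3, 25.2, 25.5, 26.9, 27.2, 27.6, 28.1,
     28.3, 28.6, 28.8, 29.7, 30.9, 31.2, 31.3, 31.7, 31.7, 32.4, 32.6, 33.0, 33.5]

BETAS = {"A": (-9.0, 0.3), "B": (-30.0, 1.1), "C": (-54.0, 2.0)}

def logistic_p(x, b0, b1):
    """Equation (4)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def count_extreme(b0, b1):
    """Number of initial design points with P below .05 or above .95."""
    probs = [logistic_p(x, b0, b1) for x in X]
    return sum(1 for p in probs if p < 0.05 or p > 0.95)

for name, (b0, b1) in BETAS.items():
    first_three = [round(logistic_p(x, b0, b1), 3) for x in X[:3]]
    print(name, first_three, count_extreme(b0, b1))
```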

A.  PROCEDURE 

In simulations A1, B1 and C1, new points were added in blocks of one
(i.e., the parameters were reestimated after each new observation was
specified and simulated), while in simulations A2, B2 and C2, the points were
added in blocks of five.

Following  is  an  outline  of  the  simulation  procedure: 

(i) For each point in the initial design, an observation was simulated
by generating a uniform random number, $r_i$, in the interval (0, 1).
Each observation was defined as resulting in an "occurrence" or
"nonoccurrence" by defining $p_i$ such that $p_i = 1$ if $r_i \le P_i$ and
$p_i = 0$ otherwise.


Assumed parameter vectors:
Simulations A1 and A2: $\boldsymbol{\beta} = (-9.0, 0.3)'$
Simulations B1 and B2: $\boldsymbol{\beta} = (-30.0, 1.1)'$
Simulations C1 and C2: $\boldsymbol{\beta} = (-54.0, 2.0)'$

  i     x_i     P_i (A1, A2)    P_i (B1, B2)    P_i (C1, C2)

  1    18.8        0.034           0.000           0.000
  2    22.0        0.083           0.003           0.000
  3    22.6        0.098           0.006           0.000
  4    23.5        0.125           0.015           0.001
  5    23.7        0.131           0.019           0.001
  6    24.3        0.153           0.037           0.004
  7    25.2        0.192           0.093           0.027
  8    25.5        0.206           0.125           0.047
  9    26.9        0.283           0.399           0.450
 10    27.2        0.302           0.480           0.599
 11    27.6        0.327           0.589           0.769
 12    28.1        0.361           0.713           0.900
 13    28.3        0.375           0.756           0.931
 14    28.6        0.397           0.811           0.961
 15    28.8        0.411           0.843           0.973
 16    29.7        0.478           0.935           0.996
 17    30.9        0.567           0.982           1.000
 18    31.2        0.589           0.987           1.000
 19    31.3        0.596           0.988           1.000
 20    31.7        0.625           0.992           1.000
 21    31.7        0.625           0.992           1.000
 22    32.4        0.673           0.996           1.000
 23    32.6        0.686           0.997           1.000
 24    33.0        0.711           0.998           1.000
 25    33.5        0.741           0.999           1.000

Figure 1: Values of Parameter Vector ($\boldsymbol{\beta}$), Predictor Variable ($x_i$)
and Response Probabilities ($P_i$) for Initial Designs Used in Simulation Study.


(ii)  The  parameters  were  then  estimated  by  inputting  the  simulated 
sample  into  a  program  developed  by  Jones  [4]  which  solves  for 
the  maximum  likelihood  estimates.  (An  alternate  program  by 
Walker  and  Duncan  [11],  which  uses  the  weighted  least  squares 
approach,  would  have  yielded  nearly  the  same  estimates;  the  two 
methods  are  asymptotically  equivalent.) 

(iii) Using the estimated parameters, $\hat{H}$ was calculated and a numerical
technique was used to maximize $|X_{n+1}'\hat{H}X_{n+1}|$ over choices of
$\mathbf{x}_{n+1}$. Here $X_{n+1}$ is the $(n + 1) \times 2$ augmented design matrix
with the $(n + 1)$st row, $\mathbf{x}_{n+1}'$, representing the conditions for the
approximately optimal new design point; $\hat{H}$ is the $(n + 1) \times (n + 1)$
estimated weight matrix with $(n + 1)$st weight, $\hat{P}_{n+1}\hat{Q}_{n+1}$.
$DA_{n+1}$ was calculated using the approximately optimal new design point
as the $(n + 1)$st point.


(iv) Step (iii) was repeated using the true parameters and weights.
$DE_{n+1}$ was calculated using the specified exactly optimal new
design point as the $(n + 1)$st point.

(v) The approximately optimal new design point (from step (iii)) was
added to the design and its observation was simulated using the
technique described in step (i). Note that the design was augmented
with the approximately optimal point but its observation was
simulated using the true parameters. This corresponds to a
real-life situation where the experimenter can only approximate
the optimal new design point but the probability of
response for that point follows the true (unknown) logistic
function.


(vi) Simulations A1, B1 and C1 only: Treating the augmented design
as the new "initial" design (resetting the sample size, $n$, to
$n + 1$ each time a point was added), steps (ii)-(v) were
repeated 24 more times so that the final design had 50 points.

(vii) Simulations A2, B2 and C2 only: Using the estimated parameters
from (ii), steps (iii) and (iv) were repeated four more times
by treating the augmented design as the new initial design
(resetting the sample size, $n$, to $n + 1$ each time a point was
added). Thus, a new block of five points was added without
reestimating the parameters.

(viii) Simulations A2, B2 and C2 only: Steps (ii)-(v) and (vii) were
repeated four more times so that the final design had 50 points.

(ix) In order to estimate the expected values of $DA_n$ and $R_n$, and to
construct confidence intervals, steps (i)-(viii) were repeated
120 times.
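
The steps above can be sketched as a single replication. This is a minimal
illustration, not the original simulation program: a Newton-Raphson fit stands in for
the maximum likelihood program of [4], the optimal point is found by a grid search over
an assumed interval (15 to 40), the exponent in equation (4) is clamped to avoid
overflow, and all names are hypothetical. The `block` argument controls how often the
parameters are reestimated (1 for simulations of type A1, 5 for type A2).

```python
import math
import random

D_OPT = 0.0501185     # normalized determinant of a D-optimal design, times b1^2

def prob(x, b0, b1):
    """Equation (4), with the exponent clamped to avoid overflow."""
    eta = max(-30.0, min(30.0, b0 + b1 * x))
    return 1.0 / (1.0 + math.exp(-eta))

def moments(xs, b0, b1):
    """Weighted sums forming X'HX for a one-variable design."""
    s_w = s_wx = s_wxx = 0.0
    for x in xs:
        P = prob(x, b0, b1)
        w = P * (1.0 - P)
        s_w += w
        s_wx += w * x
        s_wxx += w * x * x
    return s_w, s_wx, s_wxx

def det_info(xs, b0, b1):
    s_w, s_wx, s_wxx = moments(xs, b0, b1)
    return s_w * s_wxx - s_wx ** 2

def d_eff(xs, b0, b1):
    return (det_info(xs, b0, b1) / len(xs) ** 2) / (D_OPT / b1 ** 2)

def fit(xs, ps, iters=25):
    """Newton-Raphson maximum likelihood fit (assumed stand-in for [4])."""
    b0 = b1 = 0.0
    for _ in range(iters):
        s_w, s_wx, s_wxx = moments(xs, b0, b1)
        det = s_w * s_wxx - s_wx ** 2
        if det < 1e-10:                    # near-singular X'HX: stop updating
            break
        g0 = sum(p - prob(x, b0, b1) for x, p in zip(xs, ps))
        g1 = sum((p - prob(x, b0, b1)) * x for x, p in zip(xs, ps))
        b0, b1 = (b0 + (s_wxx * g0 - s_wx * g1) / det,
                  b1 + (s_w * g1 - s_wx * g0) / det)
    return b0, b1

def best_point(xs, b0, b1, grid):
    return max(grid, key=lambda x: det_info(xs + [x], b0, b1))

def replicate(xs0, true_b, n_add, grid, rng, block=1):
    t0, t1 = true_b
    xs = list(xs0)
    ps = [1 if rng.random() <= prob(x, t0, t1) else 0 for x in xs]   # step (i)
    history = []
    for k in range(n_add):
        if k % block == 0:
            e0, e1 = fit(xs, ps)                                     # step (ii)
        x_apx = best_point(xs, e0, e1, grid)                         # step (iii)
        x_exa = best_point(xs, t0, t1, grid)                         # step (iv)
        da = d_eff(xs + [x_apx], t0, t1)
        de = d_eff(xs + [x_exa], t0, t1)
        history.append((len(xs) + 1, da, da / de))
        xs.append(x_apx)                                             # step (v)
        ps.append(1 if rng.random() <= prob(x_apx, t0, t1) else 0)
    return history

X0 = [18.8, 22.0, 22.6, 23.5, 23.7, 24.3, 25.2, 25.5, 26.9, 27.2, 27.6, 28.1,
      28.3, 28.6, 28.8, 29.7, 30.9, 31.2, 31.3, 31.7, 31.7, 32.4, 32.6, 33.0, 33.5]
GRID = [15.0 + 0.05 * i for i in range(501)]       # assumed search interval
run = replicate(X0, (-9.0, 0.3), 5, GRID, random.Random(1980), block=5)
for n, da, r in run:
    print(n, round(da, 4), round(r, 4))
```

Only five points are added here to keep the run short. Averaging $DA_n$ and $R_n$ over
many calls with different random streams would correspond to step (ix).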


B.  RESULTS 

For each simulation, expected values of $DA_n$ and $R_n$ were estimated by
averaging results from the 120 replications. Plots of $DA_n$ versus $n$ are
shown in Figure 2 for simulations A1 and A2, in Figure 3 for simulations
B1 and B2 and in Figure 4 for simulations C1 and C2. Also included are
approximate 95% confidence intervals for $n = 35$ and 50. The analogous
plots and confidence intervals for $R_n$ are presented in Figure 5 for
simulations A1 and A2, in Figure 6 for simulations B1 and B2 and in Figure 7 for
simulations C1 and C2.

It seems reasonable to believe that the augmentation procedure would
yield higher D-efficiencies when the parameters are reestimated before each
new point is added than when points are added in blocks of more than one.
However, in the cases considered (see Figures 2, 3 and 4) the size of the
blocks did not have a significant effect on efficiency $DA_n$. This indicates
that from a practical standpoint, very little is lost by augmenting in blocks
of five rather than one. In addition, the high relative efficiencies in
Figures 5, 6 and 7 reveal that the loss in efficiency due to the need to
use estimated parameter values is minimal. Another observation to be made
from Figures 5, 6 and 7 is that for poorer initial designs there seems to
be a "start-up" phase before expected relative efficiency starts increasing.


Figure 3: D-efficiency Versus Sample Size for Simulations B1 and B2;
($\beta_0 = -30.0$, $\beta_1 = 1.1$). 95% Confidence Intervals Shown for
n = 35 and 50.

Figure 4: D-efficiency Versus Sample Size for Simulations C1 and C2;
($\beta_0 = -54.0$, $\beta_1 = 2.0$). 95% Confidence Intervals Shown for
n = 35 and 50.

Figure 6: Relative Efficiency Versus Sample Size for Simulations B1 and B2;
($\beta_0 = -30.0$, $\beta_1 = 1.1$). 95% Confidence Intervals Shown for n = 35
and 50. [Plot of $R_n$ against $n$; curves for Blocks of Size 1 (Simulation B1)
and Blocks of Size 5 (Simulation B2).]


V.  EXAMPLES 


Three examples will be taken from the previously mentioned data set of
twenty-eight -Gx accelerator runs. (See [8] and [10].) In each case, the
next five design points (one block of five) will be specified and the
D-efficiencies after each new run will be estimated. If the experimenter
chooses to work in blocks of size smaller than five (say three, for example),
he or she would make only the first three runs specified and then stop to
reestimate the parameters and specify three new runs.


A.  TWO-PARAMETER  CASE 

Model A from [10] used peak force along the anatomical Z axis (denoted
FHC) as the predictor of injury. The model, as estimated from the initial
28 runs, was given by

$$\hat{P}(\mathrm{FHC}) = (1 + \exp\{-[-7.3705 - .1072(\mathrm{FHC})]\})^{-1}$$

The  existing  design  has  an  estimated  D-efficiency  of  .2347  and  the  next 
five  design  points  (one  block  of  five)  are  listed  in  Figure  8(a)  along  with 
the  estimated  D-efficiencies  after  each  point  is  added  to  the  design. 

For the second example, consider model 3 from [8] in which peak sled
acceleration (denoted $z_1$) was the predictor of injury. The estimated model
was given as

$$\hat{P}(z_1) = \{1 + \exp[-(-49.81 + .4472 z_1)]\}^{-1}$$

For this case the existing design has an estimated D-efficiency of .0134 and
the next five design points are listed in Figure 8(b) along with the estimated
D-efficiencies after each point is added to the design.

  n     FHC = x_n     Estimated D-efficiency

 29       -78.9              .2592
 30       -78.9              .2801
 31       -78.9              .2977
 32       -57.8              .3140
 33       -80.4              .3298

Figure 8(a): One Block of Five Additional Design Points for
Estimating Probability of Injury From Peak Force
Along the Anatomical Z-Axis (FHC).

  n     z_1 = x_n     Estimated D-efficiency

 29       114.9              .0339
 30       114.9              .0517
 31       114.9              .0672
 32       114.9              .0807
 33       107.9              .0927

Figure 8(b): One Block of Five Additional Design Points for
Estimating Probability of Injury From Peak
Sled Acceleration ($z_1$).
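
For comparison with the tabulated points, the sketch below computes $LD_{17.6}$ and
$LD_{82.4}$, the two support points of the D-optimal design with $P' = .824$, for each
of the two estimated models above. The function name and dictionary are hypothetical;
the coefficients are those quoted in the text.

```python
import math

def ld(percent, b0, b1):
    """Value of the predictor giving response probability percent/100."""
    p = percent / 100.0
    return (math.log(p / (1.0 - p)) - b0) / b1

MODELS = {
    "Model A, FHC": (-7.3705, -0.1072),
    "Model 3, z1": (-49.81, 0.4472),
}

for name, (b0, b1) in MODELS.items():
    print(name, round(ld(17.6, b0, b1), 1), round(ld(82.4, b0, b1), 1))
```

For model 3 this gives roughly 107.9 and 114.8, close to the values 107.9 and 114.9
listed in Figure 8(b). For Model A it gives roughly -54.4 and -83.2, whereas Figure
8(a) lists -57.8, -78.9 and -80.4. Exact agreement is not expected, because the
augmentation criterion also accounts for the 28 runs already in the design.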


B.  THREE-PARAMETER  CASE 


The following example will illustrate one approach for handling
problems which arise in the three-parameter case. Model (7) from [8] used
peak sled acceleration ($z_1$) and peak head angular velocity ($x_3$) to predict
injury. The estimated model was given as

$$\hat{P}(z_1, x_3) = \{1 + \exp[-(-248.6 + 2.009 z_1 + 0.11938 x_3)]\}^{-1}.$$

As previously mentioned, when more than one predictor variable is being used,
there is a need to restrict the design space so that the optimal points do
not diverge to infinite values.

For this example, an assumption about the nature of the predictor
variables was made which served to place probabilistic constraints on the
design space. It was assumed that only the peak sled acceleration ($z_1$) is
controllable and that peak head angular velocity ($x_3$) is related to $z_1$ by
an equation of the form $x_3 = \alpha_0 + \alpha_1 z_1 + e$ where $e$ has a normal
distribution with mean 0 and variance $\sigma^2$. The estimates $\hat{\alpha}_0 = -2.57$,
$\hat{\alpha}_1 = 1.312$, $\hat{\sigma}^2 = 3724.4$ were obtained from the data by
least squares estimation.

To describe the augmentation procedure for this situation, it is
convenient to introduce some additional notation. Define the function $d(a, b)$
to equal the determinant of $X'HX$ after augmenting an existing design with new
design point $(a, b)$. Then the optimal value of $z_1$ is that value which maximizes
$E[d(z_1, x_3)]$, where $x_3 = \hat{\alpha}_0 + \hat{\alpha}_1 z_1 + e$. Here, the
expectation is taken over $e$; that is,

$$E[d(z_1, x_3)] = E[d(z_1, \hat{\alpha}_0 + \hat{\alpha}_1 z_1 + e)] = \int_{-\infty}^{\infty} d(z_1, \hat{\alpha}_0 + \hat{\alpha}_1 z_1 + e) f(e)\, de,$$

where $f(e) = [1/((2\pi)^{1/2}\hat{\sigma})] \exp[-(e^2/2\hat{\sigma}^2)]$ is the
probability density function of $e$. After each new value of $z_1$ is specified, the
expected new point $(z_1, E(x_3))$ is added to the existing design. Then the next
point is specified using the newly augmented design as the "existing design."
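
The expectation above can be approximated numerically. The sketch below does so with a
midpoint rule truncated at six standard deviations, then searches a coarse grid of
$z_1$ values. The quadrature rule, the truncation, the grid and the five-point existing
design are all assumptions made for illustration (the 28 accelerator runs are not
reproduced here); the parameter vector is the fabricated one used in this example, and
the regression estimates are those quoted above.

```python
import math

def weight(z1, x3, beta):
    eta = beta[0] + beta[1] * z1 + beta[2] * x3
    P = 1.0 / (1.0 + math.exp(-eta))
    return P * (1.0 - P)

def info_matrix(points, beta):
    """3 x 3 matrix X'HX for design points (z1, x3)."""
    m = [[0.0] * 3 for _ in range(3)]
    for z1, x3 in points:
        row = (1.0, z1, x3)
        w = weight(z1, x3, beta)
        for i in range(3):
            for j in range(3):
                m[i][j] += w * row[i] * row[j]
    return m

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def d(points, a, b, beta):
    """Determinant of X'HX after adding the design point (a, b)."""
    return det3(info_matrix(points + [(a, b)], beta))

def expected_d(points, z1, beta, a0, a1, sigma, steps=400, span=6.0):
    """Midpoint-rule approximation to E[d(z1, a0 + a1*z1 + e)], e ~ N(0, sigma^2)."""
    h = 2.0 * span * sigma / steps
    total = 0.0
    for k in range(steps):
        e = -span * sigma + (k + 0.5) * h
        f = math.exp(-e * e / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
        total += d(points, z1, a0 + a1 * z1 + e, beta) * f * h
    return total

beta = (-5.26, 0.02, 0.02)                 # fabricated parameter vector
a0, a1, sigma = -2.57, 1.312, math.sqrt(3724.4)
points = [(100.0, 120.0), (120.0, 180.0), (140.0, 160.0),
          (160.0, 230.0), (180.0, 220.0)]  # hypothetical existing design
z_grid = [100.0 + 5.0 * i for i in range(21)]
z_best = max(z_grid, key=lambda z: expected_d(points, z, beta, a0, a1, sigma))
print(z_best, expected_d(points, z_best, beta, a0, a1, sigma))
```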

After each new block of points is specified in this way, the new runs
can be made. The resulting data is then used to replace the values of $E(x_3)$
with their actualizations and to update the estimates $\hat{\alpha}_0$,
$\hat{\alpha}_1$, $\hat{\sigma}^2$ and $\hat{H}$.

Preliminary calculations for this particular example indicated that with
$\hat{\boldsymbol{\beta}} = (-248.6, 2.009, 0.11938)'$, the function $\hat{P}(z_1, x_3)$
takes on values very close to 0.0 or 1.0 at all but an extremely narrow band of
points in the $z_1x_3$-plane. As a result, a very high degree of precision needs to be
carried through all calculations to avoid numerical inaccuracies. The
narrow band of non-trivial points, the relatively large standard deviation
about the estimated regression line $x_3 = \hat{\alpha}_0 + \hat{\alpha}_1 z_1$, and
the high degree of precision required in specifying optimal new design points combine
to make this example a poor illustration of the augmentation procedure. Thus, a
different (fabricated) parameter vector $\hat{\boldsymbol{\beta}} = (-5.26, .02, .02)'$
was used for illustration purposes.

Using the fabricated parameter vector, the normalized determinant for
the initial design is 1928.75. (Note that D-efficiency cannot be estimated
because the D-optimal design for the three-parameter case is not known.)
A block of five values of $z_1$ is listed in Figure 9 along with their
corresponding values of $E(x_3) = -2.57 + 1.312 z_1$ and the estimates of the new
expected normalized determinants, $E[d(z_1, x_3)]$.


  n      z_1       E(x_3)     Estimated Normalized Determinant

 29    157.66     204.29              2201.66
 30    157.07     203.51              2372.31
 31    156.50     202.76              2499.36
 32    155.96     202.05              2591.25
 33    155.44     201.37              2654.74

Figure 9: One Block of Five Additional Design Points
for Estimating Probability of Injury From
Peak Sled Acceleration ($z_1$) and Peak Head
Angular Velocity ($x_3$).


VI. SUMMARY


A  procedure  for  optimal  augmentation  of  an  experimental  design  was 
presented.  In  particular,  the  augmentation  procedure  was  applied  to  the 
problem  of  estimating  the  two-parameter  logistic  function.  Since,  for 
this  case,  the  optimality  criterion  is  dependent  on  the  unknown  parameter 
values,  parameter  estimates  need  to  be  used  and  updated  periodically. 

A  simulation  study  was  conducted  to  evaluate  the  augmentation  procedure. 
It  was  found  that  updating  the  parameter  estimates  after  each  new  design 
point  is  added  does  not  result  in  significantly  higher  efficiencies  than 
updating  the  estimates  only  after  each  block  of  five  new  points  is  added. 

It  was  also  found  that  the  loss  in  efficiency  due  to  the  need  to  use 
estimated  parameter  values  rather  than  the  true  values  is  minimal. 




VII.  REFERENCES 


[1] Draper, N. R., and St. John, R. C., "Models and Designs for Experiments
With Mixtures: I, Background Material," Technical Report No. 360,
Department of Statistics, University of Wisconsin, 1975.

[2] Draper, N. R., and St. John, R. C., "Designs in Three and Four Components
For Mixture Models With Inverse Terms," Technometrics, Vol. 19, pp. 117-130,
1977.

[3] Dykstra, O., Jr., "The Augmentation of Experimental Data to Maximize |X'X|,"
Technometrics, Vol. 13, pp. 682-688, 1971.

[4] Jones, R. H., "Probability Estimation Using a Multinomial Logistic Function,"
J. Statist. Comput. Simul., Vol. 3, pp. 315-329, 1975.

[5] Kalish, L. A., and Smith, D. E., "Optimal Designs for Estimation of the
Two-Parameter Logistic Function," Technical Report No. 112-4, Desmatics,
Inc., 1980.

[6] Kiefer, J., "Optimum Experimental Designs," J. Royal Statist. Soc. - Ser. B,
Vol. 21, pp. 272-319, 1959.

[7] Smith, D. E., "Research on Construction of a Statistical Model for
Predicting Impact Acceleration Injury," Technical Report No. 102-2, Desmatics, Inc.,
1976.

[8] Smith, D. E., "An Examination of Statistical Impact Acceleration Injury
Prediction Models Based on -Gx Accelerator Data from Subhuman Primates,"
Technical Report No. 102-6, Desmatics, Inc., 1978.

[9] Smith, D. E., and Gardner, R. L., "A Study of Estimation Accuracy When
Using a Logistic Model for Prediction of Impact Acceleration Injury,"
Technical Report No. 102-5, Desmatics, Inc., 1978.

[10] Smith, D. E., and Peterson, J. J., "An Examination of Statistical Impact
Acceleration Injury Prediction Models Based on Torque and Force Variables,"
Technical Report No. 112-1, Desmatics, Inc., 1979.

[11] Walker, S. H., and Duncan, D. B., "Estimation of the Probability of an
Event as a Function of Several Independent Variables," Biometrika, Vol. 54,
pp. 167-179, 1967.




Unclassified

REPORT DOCUMENTATION PAGE

1. REPORT NUMBER: 112-5

4. TITLE (and Subtitle): OPTIMAL AUGMENTATION OF EXPERIMENTAL DESIGNS FOR
ESTIMATION OF THE LOGISTIC FUNCTION

5. TYPE OF REPORT & PERIOD COVERED: Technical Report

7. AUTHOR(s): Leslie A. Kalish and Dennis E. Smith

8. CONTRACT OR GRANT NUMBER(s): N00014-79-C-0128, N00014-75-C-1054

9. PERFORMING ORGANIZATION NAME AND ADDRESS: Desmatics, Inc., P. O. Box 618,
State College, PA 16801

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: NR 207-037,
NR 042-334

11. CONTROLLING OFFICE NAME AND ADDRESS: Office of Naval Research,
Arlington, VA 22217

15. SECURITY CLASS. (of this report): Unclassified

16. DISTRIBUTION STATEMENT (of this Report): Distribution of this report is
unlimited.

19. KEY WORDS: Logistic Function; Estimation; Optimal Augmentation

20. ABSTRACT: A criterion for optimal augmentation of an experimental design is
applied to the problem of estimating the logistic function. A simulation study is
conducted to evaluate the procedure in the two-parameter case. Examples
in the development of impact acceleration injury prediction models are given.

Unclassified
